Efficient Single Step Traversals in Main-Memory Graph-Shaped Data
Abstract
Management of graph-shaped data has gained momentum in both industry and research. Traversal queries over a graph-shaped dataset are easy to express and can be executed efficiently using graph databases. High-performance traversals through graph-shaped data are claimed to be enabled by native graph storage (i.e., encoding data using graph data structures) and native graph processing (i.e., operating on data with graph-domain-specific operations). A common belief is that native graph storage databases are inherently superior to non-native graph storage databases (e.g., relational databases) in terms of traversal efficiency. This claim is supported especially by graph database vendors, but it has not yet been objectively proven or disproven. In this work, we study, in the context of main-memory systems, how the execution performance of the primitives of arbitrary traversal algorithms (i.e., single step traversal queries) is affected by native and non-native graph storage. We focus on single step traversal queries that address navigation in graph-shaped data. We compare classic graph encoding and a state-of-the-art graph database micro-index as representatives of native graph storage, and table scanning and indexing by several binary search trees as representatives of non-native graph storage. We evaluate these representatives on both artificial datasets and real-world graph datasets. To control for confounding variables, we implement a unified main-memory-only experimental query engine, which avoids bias from the internal behavior of black-box systems (e.g., main-memory systems vs. disk-based systems). Our experimental results show that highly efficient traversal algorithms in main-memory systems require indexing adjacent records and incident relationships, rather than depending on whether the storage is native or non-native graph storage.
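To make the compared storage options concrete, the following is a minimal Python sketch of one single step traversal (fetching the out-neighbors of a vertex) under three layouts: an unindexed edge-table scan, a classic adjacency-list encoding, and an index over the edge table. It is an illustration only and not the authors' experimental query engine. The `EDGES` data and all function names are hypothetical, and a sorted array with binary search stands in for the binary search trees mentioned in the abstract.

```python
from bisect import bisect_left
from collections import defaultdict

# Hypothetical toy edge table with integer vertex ids: (source, target) rows,
# laid out the way a relational table might store relationships.
EDGES = [(1, 2), (1, 3), (2, 3), (3, 1), (3, 4)]


def neighbors_scan(edges, v):
    """Non-native, unindexed: inspect every row of the edge table."""
    return [dst for src, dst in edges if src == v]


def build_adjacency(edges):
    """Native-style encoding: each vertex maps directly to its adjacent vertices."""
    adj = defaultdict(list)
    for src, dst in edges:
        adj[src].append(dst)
    return adj


def neighbors_adjacency(adj, v):
    """Single step traversal by direct lookup of the adjacency list."""
    return list(adj.get(v, []))


def build_sorted_index(edges):
    """Non-native, indexed: edge rows sorted by (source, target).
    Assumption: a sorted array searched by bisection stands in for a
    binary search tree index."""
    return sorted(edges)


def neighbors_index(index, v):
    """Locate the contiguous run of rows whose source is v by binary search."""
    lo = bisect_left(index, (v,))      # first row with source >= v
    hi = bisect_left(index, (v + 1,))  # first row with source >= v + 1
    return [dst for _, dst in index[lo:hi]]


if __name__ == "__main__":
    adj = build_adjacency(EDGES)
    idx = build_sorted_index(EDGES)
    for v in (1, 3, 4):
        print(v, neighbors_scan(EDGES, v), neighbors_adjacency(adj, v), neighbors_index(idx, v))
```

All three functions return the same neighbors. They differ in how much data each touches per step: the scan reads every edge row, while the adjacency list and the index each go more or less directly to the adjacent records. This is in line with the abstract's conclusion that indexing adjacency matters more than the native or non-native label.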
Similar resources
Towards Efficient Graph Traversal using a Multi-GPU Cluster
Graph processing has always been a challenge, as there are inherent complexities in it. These include scalability to larger data sets and clusters, dependencies between vertices in the graph, irregular memory accesses during processing and traversals, minimal locality of reference, etc. In the literature, there are several implementations for parallel graph processing on single GPU systems but only...
Highspeed Graph Processing Exploiting Main-Memory Column Stores
A popular belief in the graph database community is that relational database management systems are generally ill-suited for efficient graph processing. This might apply for analytic graph queries performing iterative computations on the graph, but does not necessarily hold true for short-running, OLTP-style graph queries. In this paper we argue that, instead of extending a graph database manag...
Automatic Algorithm Transformation for Efficient Multi-Snapshot Analytics on Temporal Graphs
Analytical graph algorithms commonly compute metrics for a graph at one point in time. In practice it is often also of interest how metrics change over time, e.g., to find trends. For this purpose, algorithms must be executed for multiple graph snapshots. We present Single Algorithm Multiple Snapshots (SAMS), a novel approach to execute algorithms concurrently for multiple graph snapshots. SAMS...
External Memory Algorithms For Path Traversal in Graphs
This thesis will present a number of results related to path traversal in trees and graphs. In particular, we focus on data structures which allow such traversals to be performed efficiently in the external memory setting. In addition, for trees and planar graphs the data structures we present are succinct. Our tree structures permit efficient bottom-up path traversal in rooted trees of arbitra...
Memory-aware tree traversals with pre-assigned tasks
We study the complexity of traversing tree-shaped workflows whose tasks require large I/O files. We target a heterogeneous architecture with two resource types, each with a different memory, such as a multicore node equipped with a dedicated accelerator (FPGA or GPU). The tasks in the workflow are colored according to their type and can be processed if all their input and output files can be st...